
정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)

Korean Title (한글제목): LUKE를 이용한 한국어 자연어 처리: 개체명 인식, 개체 연결
English Title: LUKE for Korean Natural Language Processing: Named Entity Recognition and Entity Linking
Author: Jinwoo Min (민진우), Seung-Hoon Na (나승훈), Hyun-Ho Kim (김현호), Seon-Hoon Kim (김선훈), Inho Kang (강인호)
Citation: Vol. 28, No. 3, pp. 175-183 (Mar. 2022)
Korean Abstract (translated)
Transformer-based language models such as BERT, pretrained on large unlabeled corpora with self-supervised learning and then applied to a variety of natural language processing tasks, have shown remarkable performance gains. However, such language models cannot directly represent real-world knowledge, and various studies have attempted to inject knowledge bases into them to address this limitation. In this work, we define an entity sequence and entity embeddings in addition to the word sequence, and pretrain LUKE, a model that performs self-attention with a separate query parameter for each word/entity sequence pair, on Korean Wikipedia. Applying it to the entity-related tasks of named entity recognition and entity linking yields improvements of 0.5%p and 1.05%p, respectively, over existing RoBERTa-based models.
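To make the "separate query parameter for each word/entity sequence pair" idea concrete, here is a minimal single-head PyTorch sketch of entity-aware self-attention, written only from the description above; the class and parameter names (EntityAwareSelfAttention, q_ww, q_we, q_ew, q_ee) are illustrative and are not taken from the authors' code.

# A minimal, single-head sketch of entity-aware self-attention as described
# above: keys and values are shared, but a separate query projection is used
# for each (attending type, attended type) pair. Names are illustrative only.
import torch
import torch.nn as nn

class EntityAwareSelfAttention(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.q_ww = nn.Linear(hidden_size, hidden_size)   # word   -> word
        self.q_we = nn.Linear(hidden_size, hidden_size)   # word   -> entity
        self.q_ew = nn.Linear(hidden_size, hidden_size)   # entity -> word
        self.q_ee = nn.Linear(hidden_size, hidden_size)   # entity -> entity
        self.key = nn.Linear(hidden_size, hidden_size)    # shared keys
        self.value = nn.Linear(hidden_size, hidden_size)  # shared values
        self.scale = hidden_size ** 0.5

    def forward(self, words, entities):
        # words: (batch, n_w, hidden); entities: (batch, n_e, hidden)
        n_w = words.size(1)
        x = torch.cat([words, entities], dim=1)
        k, v = self.key(x), self.value(x)
        k_w, k_e = k[:, :n_w].transpose(1, 2), k[:, n_w:].transpose(1, 2)
        # Attention scores for word queries and for entity queries.
        s_w = torch.cat([self.q_ww(words) @ k_w, self.q_we(words) @ k_e], dim=2)
        s_e = torch.cat([self.q_ew(entities) @ k_w, self.q_ee(entities) @ k_e], dim=2)
        attn = torch.softmax(torch.cat([s_w, s_e], dim=1) / self.scale, dim=-1)
        return attn @ v  # (batch, n_w + n_e, hidden)

A full model would add multiple heads, output projections, and the entity embedding table; the query selection by token type is the part specific to LUKE.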
English Abstract
Transformer-based language models (LMs) such as BERT, trained on large amounts of unlabeled text using self-supervised learning, have shown remarkable performance improvements on various natural language processing (NLP) application tasks. Despite these marked improvements, classical pretrained language models do not directly incorporate external real-world knowledge bases such as the Wikipedia knowledge graph or its triples. To inject real-world knowledge bases into a pretrained language model, many studies on "knowledge-enhanced" pretrained language models have been conducted. Among them, LUKE attaches a sequence of entities to the sequence of original input tokens and performs entity-aware self-attention using entity embeddings, leading to noticeably improved results on entity-related tasks and state-of-the-art performance on the SQuAD dataset. In this paper, we present a Korean version of LUKE pretrained on a large Korean Wikipedia corpus and show its application results on entity-related tasks in Korean. In particular, we newly propose a way of applying LUKE to the entity linking task, which has not been explored in previous work using LUKE. Experimental results on both Korean named entity recognition and entity linking tasks show improvements over RoBERTa-based models.
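As a usage reference, the sketch below loads the LUKE classes available in Hugging Face Transformers and feeds them a sentence plus character-level entity spans; the checkpoint name "studio-ousia/luke-base" is the public English model and serves only as a stand-in, since the Korean LUKE checkpoint trained in this paper is not assumed to be publicly released.

# Usage sketch with the LUKE classes in Hugging Face Transformers.
# "studio-ousia/luke-base" is the public English checkpoint, used as a
# stand-in for the paper's Korean model, which may not be available.
from transformers import LukeTokenizer, LukeModel

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyonce lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character spans of the two entity mentions

inputs = tokenizer(text, entity_spans=entity_spans,
                   add_prefix_space=True, return_tensors="pt")
outputs = model(**inputs)

word_states = outputs.last_hidden_state            # one vector per word token
entity_states = outputs.entity_last_hidden_state   # one vector per entity
print(word_states.shape, entity_states.shape)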
Keyword(s):